The data set 37-00049_UOF-P_2016_prepped contains information on use-of-force incidents involving the police in Dallas, Texas, in 2016. It includes details such as date, time, location, officer and subject information, the type of force used, the reason for the use of force, and injury severity. Demographic data on subjects (race, gender, age) and officers (race, gender, years of experience) is also provided. I have analyzed this data set using bar plots, histograms, a pie chart, line charts, scatter plots and other plots.
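# Load the raw CSV (the path is specific to the author's machine)
Dallas_2016_ds <- read.csv("E:\\DV_MA304\\Assignment\\37-00049_UOF-P_2016_prepped.csv")
# Data cleaning: drop the first row (presumably a descriptive header row, not an incident)
Dallas_2016_ds <- Dallas_2016_ds[-1, ]
# Working copy used in the analysis below
Dallas_data <- Dallas_2016_ds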
After cleaning, the data set contains 2383 observations and 47 variables:
dim(Dallas_data)
## [1] 2383 47
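The reason for dropping the first row is not stated. One plausible explanation, assumed here, is that the CSV carries a second, descriptive header row beneath the column names. The following is a minimal sketch of that situation using made-up inline text rather than the real file; the second-row labels and the two records are hypothetical.

```r
# Toy CSV standing in for the real file. ASSUMPTION: row 1 holds the column
# names and row 2 holds a second, descriptive header rather than an incident.
csv_text <- "INCIDENT_DATE,OFFICER_GENDER,SUBJECT_GENDER
date_label,officer_label,subject_label
9/3/16,Male,Female
3/22/16,Female,Male"

raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Drop the descriptive row, as Dallas_2016_ds[-1, ] does above.
cleaned <- raw[-1, ]
rownames(cleaned) <- NULL  # renumber the remaining rows from 1

dim(cleaned)  # 2 rows (incidents), 3 columns
```

If the file does have this layout, the extra text row also means that columns which should be numeric are read as character and need converting afterwards, for example with `as.numeric()`.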
NOTE: In all the plots below, I used the plotly library to convert each plot object into an interactive plot that can be zoomed, panned, and hovered over to display additional information.
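As a rough illustration of that conversion, the sketch below builds a static ggplot2 bar chart and passes it to `ggplotly()`. The data frame `toy` and its `gender` column are hypothetical stand-ins, not the Dallas records, and the styling is only an assumption about how the plots might look.

```r
library(ggplot2)
library(plotly)

# Toy data (hypothetical), not the Dallas records.
toy <- data.frame(gender = c("Male", "Male", "Female", "Male"))

# Static ggplot2 bar chart of counts per category.
p <- ggplot(toy, aes(x = gender)) +
  geom_bar(fill = "steelblue") +
  labs(x = "Gender", y = "Count")

# Convert to an interactive plotly widget (zoom, pan, hover tooltips).
interactive_p <- ggplotly(p)
```

Printing `interactive_p` in an R Markdown HTML document or in the RStudio viewer renders the interactive version.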
This bar plot shows the number of officers by gender. It clearly shows that the majority of officers are male (2143), while 240 are female.
This bar plot shows the number of subjects by gender. It clearly shows that the majority of subjects are male (1932), while 440 are female.
This bar plot shows the number of officers by race. The top three officer races are White (1478), Hispanic (482) and Black (341). The majority of officers are White.
This bar plot shows the number of subjects by race. The top three subject races are Black (1333), Hispanic (524) and White (470). Here the majority of subjects are Black.
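The counts behind bar plots like these can be obtained by tabulating a categorical column and sorting the result. The sketch below uses a small hypothetical vector in place of a column such as `SUBJECT_RACE` (that column name is an assumption), and base graphics rather than the plotting code actually used for the report.

```r
# Toy vector (hypothetical values) standing in for a race column.
race <- c("Black", "White", "Black", "Hispanic", "Black",
          "Hispanic", "Asian", "Black", "Hispanic", "White")

# Count each category and sort from most to least frequent.
race_counts <- sort(table(race), decreasing = TRUE)

# Keep the three most frequent categories.
top3 <- head(race_counts, 3)
print(top3)

# Base-graphics bar plot of the counts.
barplot(top3, xlab = "Race", ylab = "Number of subjects")
```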
##                             Female Male
##   Baton Display                  2    4
##   Baton Strike/Closed Mode       0    3
##   Baton Strike/Open Mode         0    8
##   BD - Grabbed                  80  333
##   BD - Pushed                   29  136
##   BD - Tripped                  21   47
##   Combat Stance                  0    7
##   Feet/Leg/Knee Strike          11   68
##   Hand Controlled Escort        98  213
##   Hand/Arm/Elbow Strike         18  106
##   Handcuffing Take Down         25  116
##   Held Suspect Down            168  623
##   Joint Locks                   97  278
##   K-9 Deployment                 0   11
##   Leg Restraint System          14   36
##   LVNR                           0    1
##   OC Spray                      11   48
##   Other Impact Weapon            2    5
##   Pepperball Impact              1    1
##   Pepperball Saturation          1    3
##   Pressure Points               22  105
##   Take Down - Arm               77  228
##   Take Down - Body              25  184
##   Take Down - Group             10   57
##   Take Down - Head               3   38
##   Taser                         19  163
##   Taser Display at Person       20  162
##   Verbal Command               264 1034
##   Weapon display at Person      40  432
The colour scale of this heat map runs from white to red and represents frequencies from 0 to 1000 for males and females. It clearly shows that Verbal Command is the most frequently used type of force for both males (1034) and females (264).
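A table of force type by gender like the one above can be built by stacking several force-type columns into one long column and cross-tabulating it against gender. The sketch below is one way to do this in base R; the column names `TYPE_OF_FORCE_USED1`, `TYPE_OF_FORCE_USED2` and `GENDER`, the vector `force_cols`, and the three toy rows are all assumptions for illustration.

```r
# Toy incidents (hypothetical). ASSUMPTION: each incident stores its force
# types in separate columns, with "" where no further force was used.
toy <- data.frame(
  GENDER              = c("Male", "Female", "Male"),
  TYPE_OF_FORCE_USED1 = c("Verbal Command", "Verbal Command", "Taser"),
  TYPE_OF_FORCE_USED2 = c("Held Suspect Down", "", "Verbal Command"),
  stringsAsFactors = FALSE
)
force_cols <- c("TYPE_OF_FORCE_USED1", "TYPE_OF_FORCE_USED2")

# Stack the force columns into long format: one row per (incident, force) pair.
# unlist() concatenates column by column, so gender is repeated once per column.
long <- data.frame(
  gender = rep(toy$GENDER, times = length(force_cols)),
  force  = unlist(toy[force_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)
long <- long[!is.na(long$force) & long$force != "", ]

# Cross-tabulate force type by gender; these counts are what a heat map colours.
force_by_gender <- table(long$force, long$gender)
print(force_by_gender)

# Base-graphics heat map of the counts, white (low) to red (high).
heatmap(unclass(force_by_gender), Rowv = NA, Colv = NA, scale = "none",
        col = colorRampPalette(c("white", "red"))(20))
```

Because one incident can involve several types of force, the totals in such a table can exceed the number of incidents.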
This pie chart shows the distribution of subjects by race; the different colours represent the races listed in the legend. The plot shows that Black subjects account for 55.9% of the incidents. Hovering over a slice displays the number of subjects and the rounded percentage of subjects involved in the incidents.
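The 55.9% figure can be checked from the counts quoted earlier: 1333 Black subjects out of 2383 records. The short sketch below reproduces the arithmetic with the three counts given in the text; the "Other" slice is an assumption that simply lumps together the remaining records.

```r
# Subject counts by race quoted in the text; "Other" is the remainder of the
# 2383 records (ASSUMPTION: all remaining categories lumped together).
counts <- c(Black = 1333, Hispanic = 524, White = 470)
counts["Other"] <- 2383 - sum(counts)

# Share of each race as a percentage, rounded to one decimal place.
pct <- round(100 * counts / sum(counts), 1)
print(pct)

# Base-graphics pie chart labelled with the rounded percentages.
pie(counts, labels = paste0(names(counts), " (", pct, "%)"))
```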
The above plot has two facets, one for male and one for female subjects, and each shows how many subjects of each race were arrested or not arrested. It clearly shows that, irrespective of gender, more Black subjects were arrested than subjects of any other race. Hovering over a bar shows the corresponding number of subjects.
This plot shows whether officers were injured or not, broken down by officer race; hovering over the plot shows how many officers were injured or not injured. (In this plot I considered only Black and White, as most of them belong to these races.)
The plot shows that the largest number of incidents occurred in the CENTRAL division (563), while NORTHWEST had the fewest (191).
This clearly shows that Verbal Command is the most frequently used type of force regardless of the subject's race, and that force was used most often on Black subjects.
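Counts by race and force type can be produced with a grouped summary. The sketch below is a minimal dplyr version under the assumption that the data is already in long format (one row per use of force); the column name `FORCE_TYPE` and the toy rows are hypothetical.

```r
library(dplyr)

# Toy rows (hypothetical). ASSUMPTION: one row per use of force.
toy <- data.frame(
  SUBJECT_RACE_GROUPED = c("Black", "Black", "White", "Black", "White"),
  FORCE_TYPE           = c("Verbal Command", "Taser", "Verbal Command",
                           "Verbal Command", "Verbal Command")
)

force_by_race <- toy %>%
  group_by(SUBJECT_RACE_GROUPED, FORCE_TYPE) %>%
  summarise(n = n(), .groups = "drop") %>%  # "drop" returns an ungrouped result
  arrange(desc(n))

print(force_by_race)
```

Setting `.groups` explicitly also avoids the informational message that `summarise()` otherwise prints about grouped output.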
This scatter plot shows the arrest status (arrested or not) of the subject by district and incident reason.
In this plot, both the colour scale (yellow to red) and the size of each circle represent the number of incidents. Each circle shows the incident reason and the number of such incidents for a particular division and subject race. Hovering over a circle gives the details.
This clearly shows that most incidents occurred in the CENTRAL division; it also shows the number of incidents in each division for each incident reason.
To visualize time series data, all date columns must be in a proper date format; any that are not should be converted to Date. Hours, minutes, days and months can then be extracted from the dates and used for in-depth analysis. In the time series plots below, the month is extracted and used in the analysis.
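The conversion and month extraction described above might look like the following base R sketch. The date strings are made up, and the `month/day/two-digit-year` format is an assumption about how the dates are stored; the variable names are hypothetical.

```r
# Toy date strings (hypothetical). ASSUMPTION: dates are stored as
# month/day/two-digit-year text, e.g. "9/3/16".
incident_date <- c("9/3/16", "3/22/16", "3/5/16", "12/31/16")

# Convert text to Date; a wrong format string would yield NA values.
parsed <- as.Date(incident_date, format = "%m/%d/%y")
stopifnot(!anyNA(parsed))

# Extract the month number (1-12) and a locale-independent abbreviation.
inc_month <- as.integer(format(parsed, "%m"))
inc_month_name <- factor(month.abb[inc_month], levels = month.abb)

# Incidents per month; the factor levels keep months in calendar order
# and show months with no incidents as 0.
monthly_counts <- table(inc_month_name)
print(monthly_counts)
```

Checking for `NA` right after `as.Date()` is a cheap way to catch a format string that does not match the data.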
The line chart clearly shows that the number of incidents peaked early in the year, in March (264), and that the fewest incidents occurred at the end of the year, in December (100).
This line chart shows each subject race as a separately coloured line (see the legend). The line for Black subjects runs higher than those for Hispanic and White subjects, which follow similar trends to each other.
In this plot, a smoothed line is added using the geom_smooth() function with method = “loess”. This method estimates the relationship between the variables by fitting a smooth curve through the plotted points using locally weighted regression. The smooth line represents the overall trend of the data. LOESS behaves much like a weighted moving average, so the smoothed line can be read as the local average number of incidents around each month.
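To make the idea concrete, the sketch below fits the same kind of curve directly with `stats::loess()`, which is the fit that `geom_smooth(method = "loess")` relies on. The monthly counts are invented for illustration and are not the Dallas figures; the `incidents` column name is an assumption.

```r
# Toy monthly counts (hypothetical numbers, not the Dallas figures).
df_month <- data.frame(
  INC_MONTH = 1:12,
  incidents = c(210, 230, 250, 225, 215, 200, 190, 185, 170, 160, 140, 120)
)

# Fit a LOESS curve: around each month, a weighted local regression is fitted
# to the nearest points. span = 0.75 is the default in stats::loess.
fit <- loess(incidents ~ INC_MONTH, data = df_month, span = 0.75)
df_month$smoothed <- predict(fit)

# Plot the raw counts and overlay the smoothed trend in red.
plot(df_month$INC_MONTH, df_month$incidents, pch = 19, col = "steelblue",
     xlab = "Month", ylab = "Number of incidents")
lines(df_month$INC_MONTH, df_month$smoothed, col = "red", lwd = 2)
```

A smaller `span` follows the points more closely, while a larger one gives a flatter trend line.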
This map shows the locations in the city of Dallas where the incidents recorded in the data set occurred in 2016.
Clicking on a red circle marker on the map shows the street where the incident occurred and the incident reason.
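A map of this kind can be sketched with the leaflet package as shown below. The two points, their coordinates, and the `street` and `reason` columns are hypothetical; only the `longitude` and `latitude` column names are taken from the report, and the marker styling is an assumption.

```r
library(leaflet)

# Toy points near Dallas (hypothetical coordinates, streets and reasons).
pts <- data.frame(
  longitude = c(-96.80, -96.77),
  latitude  = c(32.78, 32.80),
  street    = c("Main St", "Elm St"),
  reason    = c("Arrest", "Service Call"),
  stringsAsFactors = FALSE
)

# Red circle markers; clicking one opens a popup with street and reason.
m <- leaflet(pts) %>%
  addTiles() %>%
  addCircleMarkers(
    lng = ~longitude, lat = ~latitude,
    color = "red", radius = 4,
    popup = ~paste0("Street: ", street, "<br>Reason: ", reason)
  )
```

Passing `lng` and `lat` explicitly means leaflet does not have to guess which columns hold the coordinates.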